Reactive Policy


Dual Policy Iteration

Wen Sun, Geoffrey J. Gordon, Byron Boots, J. Bagnell

Neural Information Processing Systems

We also provide a general convergence analysis to support our empirical findings. Although our analysis is similar to CPI's, it has a key difference: as long as the model-based optimal control (MBOC) step succeeds, we can guarantee a larger policy improvement than CPI at each iteration.
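
The update that both CPI and the analysis above revolve around is a conservative mixture of the current policy with an improved proposal. Below is a minimal tabular sketch of that step; the array shapes, the value of alpha, and the tabular policy representation are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

def conservative_update(pi, pi_expert, alpha):
    """Blend the current policy toward an expert proposal (CPI-style).

    pi and pi_expert are (n_states, n_actions) arrays whose rows sum
    to 1, so the mixture is again a valid stochastic policy. CPI picks
    a small, safe alpha; the DPI analysis allows a larger step when
    the model-based expert's improvement can be certified.
    """
    return (1.0 - alpha) * pi + alpha * pi_expert

# Toy usage: nudge a uniform 2-state, 2-action policy toward a
# deterministic expert with a small conservative step.
pi = np.full((2, 2), 0.5)
pi_expert = np.array([[1.0, 0.0], [0.0, 1.0]])
print(conservative_update(pi, pi_expert, alpha=0.1))
```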


Minimally sufficient structures for information-feedback policies

Sakcak, Basak, Weinstein, Vadim K., Timperi, Kalle G., LaValle, Steven M.

arXiv.org Artificial Intelligence

In this paper, we consider robotic tasks that require a desirable outcome to be achieved in the physical world that the robot is embedded in and interacts with. Accomplishing this objective requires designing a filter that maintains a useful representation of the physical world and a policy over the filter states. A filter is the robot's perspective of the physical world based on limited sensing, memory, and computation, and it is represented as a transition system over a space of information states. The interactions thus result from the coupling of an internal and an external system, a filter and the physical world, respectively, through a sensor mapping and an information-feedback policy. Within this setup, we look for sufficient structures, that is, internal systems and sensors sufficient for accomplishing a given task. We establish necessary and sufficient conditions that these structures must satisfy for information-feedback policies defined over the states of an internal system to exist. We also show that, under mild assumptions, minimal internal systems that can represent a particular plan/policy described over the action-observation histories exist and are unique. Finally, the results are applied to determine sufficient structures for distance-optimal navigation in a polygonal environment.
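
As a toy illustration of a filter as a transition system over information states, here is a minimal sketch; the `transition`/`policy` signatures and the wall-following example are assumptions made for illustration, not the paper's formal definitions.

```python
from dataclasses import dataclass
from typing import Callable, Hashable

@dataclass
class Filter:
    """A filter as a transition system over information states.

    transition(i, o) maps the current information state i and a fresh
    observation o to the next information state; policy(i) maps an
    information state to an action.
    """
    state: Hashable
    transition: Callable[[Hashable, Hashable], Hashable]
    policy: Callable[[Hashable], str]

    def step(self, observation):
        self.state = self.transition(self.state, observation)
        return self.policy(self.state)

# Toy usage: a robot that only remembers whether it last sensed a
# wall -- a two-state internal system instead of the full
# action-observation history.
f = Filter(
    state="no_wall",
    transition=lambda i, o: "wall" if o == "bump" else "no_wall",
    policy=lambda i: "turn" if i == "wall" else "forward",
)
print(f.step("bump"))   # -> turn
print(f.step("clear"))  # -> forward
```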


Value Iteration Networks

Neural Information Processing Systems

We introduce the value iteration network (VIN): a fully differentiable neural network with a 'planning module' embedded within. VINs can learn to plan, and are suitable for predicting outcomes that involve planning-based reasoning, such as policies for reinforcement learning. Key to our approach is a novel differentiable approximation of the value-iteration algorithm, which can be represented as a convolutional neural network and trained end-to-end using standard backpropagation. We evaluate VIN-based policies on discrete and continuous path-planning domains, and on a natural-language-based search task. We show that by learning an explicit planning computation, VIN policies generalize better to new, unseen domains.
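
To make the core construction concrete, here is a minimal numpy sketch of value iteration written as convolution plus a channel-wise max, the computation a VIN unrolls. The fixed stencils, grid size, and reward map are illustrative assumptions; in a VIN the corresponding kernels are learned end-to-end rather than given.

```python
import numpy as np

def vi_module(reward, stencils, iterations=40, gamma=0.95):
    """Value iteration as convolution + channel-wise max.

    reward:   (H, W) reward map of a grid world.
    stencils: (A, 3, 3) one transition stencil per action; a VIN
              learns these as conv kernels, here they are fixed.
    Returns the (H, W) value map after `iterations` Bellman backups.
    """
    H, W = reward.shape
    v = np.zeros((H, W))
    for _ in range(iterations):
        padded = np.pad(v, 1)  # zero values outside the grid
        q = np.empty((len(stencils), H, W))
        for a, k in enumerate(stencils):
            for i in range(H):
                for j in range(W):
                    # expected next-state value under action a
                    q[a, i, j] = np.sum(k * padded[i:i + 3, j:j + 3])
        v = reward + gamma * q.max(axis=0)  # backup: max over actions
    return v

# Toy usage: 5x5 grid, reward at one corner, four deterministic moves.
R = np.zeros((5, 5))
R[4, 4] = 1.0
stencils = np.zeros((4, 3, 3))
for a, (di, dj) in enumerate([(0, 1), (0, -1), (1, 0), (-1, 0)]):
    stencils[a, 1 + di, 1 + dj] = 1.0
print(vi_module(R, stencils).round(2))
```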


PAC Reinforcement Learning with Rich Observations

Neural Information Processing Systems

We propose and study a new model for reinforcement learning with rich observations, generalizing contextual bandits to sequential decision making. These models require an agent to take actions based on observations (features) with the goal of achieving long-term performance competitive with a large set of policies. To avoid barriers to sample-efficient learning associated with large observation spaces and general POMDPs, we focus on problems that can be summarized by a small number of hidden states and have long-term rewards that are predictable by a reactive function class. In this setting, we design and analyze a new reinforcement learning algorithm, Least Squares Value Elimination by Exploration. We prove that the algorithm learns near-optimal behavior after a number of episodes that is polynomial in all relevant parameters, logarithmic in the number of policies, and independent of the size of the observation space. Our result provides theoretical justification for reinforcement learning with function approximation.
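
As a caricature of the elimination idea only (the actual algorithm also interleaves exploration and least-squares value regression, omitted here), a sketch might look like the following; the function name, the confidence rule, and all values are assumptions for illustration.

```python
def eliminate(policies, value_estimates, confidence_width):
    """Keep only policies whose estimated value could still be optimal.

    value_estimates[i] is an estimated return for policy i (obtained
    by regression from exploration data in the real algorithm) and
    confidence_width bounds its error, so anything more than twice the
    width below the best estimate is provably suboptimal.
    """
    best = max(value_estimates)
    return [p for p, v in zip(policies, value_estimates)
            if v >= best - 2 * confidence_width]

# Toy usage: pi_b is eliminated, the other two survive this round.
print(eliminate(["pi_a", "pi_b", "pi_c"],
                value_estimates=[0.90, 0.50, 0.85],
                confidence_width=0.05))
```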


Limits of Actor-Critic Algorithms for Decision Tree Policies Learning in IBMDPs

Kohler, Hector, Akrour, Riad, Preux, Philippe

arXiv.org Artificial Intelligence

Interpretability of AI models allows for user safety checks to build trust in such AIs. In particular, Decision Trees (DTs) provide a global look at the learned model and transparently reveal which features of the input are critical for making a decision. However, interpretability is hindered if the DT is too large. To learn compact trees, a recent Reinforcement Learning (RL) framework has been proposed to explore the space of DTs using deep RL. This framework augments a decision problem (e.g., a supervised classification task) with additional actions that gather information about the features of an otherwise hidden input. By appropriately penalizing these actions, the agent learns to optimally trade off size and performance of DTs. In practice, a reactive policy for a partially observable Markov decision process (POMDP) needs to be learned, which is still an open problem. We show in this paper that deep RL can fail even on simple toy tasks of this class. However, when the underlying decision problem is a supervised classification task, we show that finding the optimal tree can be cast as a fully observable Markov decision problem and be solved efficiently, giving rise to a new family of algorithms for learning DTs that go beyond the classical greedy maximization ones.
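
A toy sketch of the augmented decision problem described above, assuming a bounds-based information state, a tuple action encoding, and an arbitrary per-test penalty (none of which come from the paper):

```python
FEATURE_COST = -0.01  # per-test penalty trading off tree size vs. accuracy

def ibmdp_step(bounds, action, x, label):
    """One transition of a toy information-gathering decision problem.

    bounds: dict feature -> (lo, hi), what the agent knows about the
    hidden input x so far. A ("test", f, t) action refines the bounds
    (an internal node of the tree); a ("predict", c) action ends the
    episode (a leaf). Returns (next_bounds, reward, done).
    """
    if action[0] == "test":
        _, f, t = action
        lo, hi = bounds[f]
        new = dict(bounds)
        new[f] = (t, hi) if x[f] > t else (lo, t)
        return new, FEATURE_COST, False
    _, c = action
    return bounds, (1.0 if c == label else 0.0), True

# Toy usage: test one feature of a hidden input, then predict.
x, label = {"f0": 0.7}, 1
bounds = {"f0": (0.0, 1.0)}
bounds, r, done = ibmdp_step(bounds, ("test", "f0", 0.5), x, label)
print(bounds, r, done)                               # refined bounds, small cost
print(ibmdp_step(bounds, ("predict", 1), x, label))  # correct leaf, reward 1
```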


Hierarchical Policy Blending as Inference for Reactive Robot Control

Hansel, Kay, Urain, Julen, Peters, Jan, Chalvatzaki, Georgia

arXiv.org Artificial Intelligence

Motion generation in cluttered, dense, and dynamic environments is a central topic in robotics, rendered as a multi-objective decision-making problem. Current approaches trade off safety against performance. On the one hand, reactive policies guarantee a fast response to environmental changes, at the risk of suboptimal behavior. On the other hand, planning-based motion generation provides feasible trajectories, but the high computational cost may limit the control frequency and thus safety. To combine the benefits of reactive policies and planning, we propose a hierarchical motion generation method, and we adopt probabilistic inference methods to formalize the hierarchical model and its stochastic optimization. We realize this approach as a weighted product of stochastic, reactive expert policies, where planning is used to adaptively compute the optimal weights over the task horizon. This stochastic optimization avoids local optima and proposes feasible reactive plans that find paths in cluttered and dense environments. Our extensive experimental study in planar navigation and 6DoF manipulation shows that our proposed hierarchical motion generation method outperforms both myopic reactive controllers and online re-planning methods.
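
For Gaussian experts, a weighted product of policies has a closed form: a precision-weighted fusion of the experts' action distributions. The sketch below shows only that fusion step, with fixed weights; in the paper the weights are computed adaptively by the planner, and the shapes and values here are illustrative assumptions.

```python
import numpy as np

def blend_experts(means, variances, weights):
    """Weighted product of Gaussian expert policies over actions.

    Each expert i proposes N(means[i], diag(variances[i])); raising it
    to the power weights[i] and multiplying yields another Gaussian
    whose mean is a precision-weighted average of the experts' means.
    Shapes: (n_experts, action_dim) for means and variances,
    (n_experts,) for weights.
    """
    prec = weights[:, None] / variances          # weighted precisions
    mean = (prec * means).sum(0) / prec.sum(0)   # fused action mean
    var = 1.0 / prec.sum(0)                      # fused variance
    return mean, var

# Toy usage: a confident obstacle-avoidance expert dominates a vaguer
# goal-reaching expert in the fused action.
means = np.array([[0.0, 0.0], [1.0, 1.0]])
variances = np.array([[0.1, 0.1], [0.4, 0.4]])
print(blend_experts(means, variances, np.array([0.7, 0.3])))
```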


Training Deep Reactive Policies for Probabilistic Planning Problems

Issakkimuthu, Murugeswari, Fern, Alan, Tadepalli, Prasad

AAAI Conferences

State-of-the-art probabilistic planners typically apply look-ahead search and reasoning at each step to make a decision. While this approach can enable high-quality decisions, it can be computationally expensive for problems that require fast decision making. In this paper, we investigate the potential for deep learning to replace search by fast reactive policies. We focus on supervised learning of deep reactive policies for probabilistic planning problems described in RDDL. A key challenge is to explore the large design space of network architectures and training methods, which was critical to prior deep learning successes. We investigate a number of choices in this space and conduct experiments across a set of benchmark problems. Our results show that effective deep reactive policies can be learned for many benchmark problems and that leveraging the planning problem description to define the network structure can be beneficial.
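
The supervised-learning setup described above reduces, in its simplest form, to fitting a classifier from planner-labeled states to actions. Below is a minimal sketch with plain softmax regression standing in for the paper's deep networks; all names, shapes, and hyperparameters are illustrative assumptions.

```python
import numpy as np

def train_reactive_policy(states, planner_actions, n_actions,
                          lr=0.1, epochs=300, seed=0):
    """Fit a softmax policy to (state, planner-action) pairs.

    states: (N, d) features; planner_actions: (N,) integer actions
    chosen offline by a slow look-ahead planner. Returns weights W so
    that argmax(states @ W, axis=1) is a fast reactive policy.
    """
    N, d = states.shape
    rng = np.random.default_rng(seed)
    W = 0.01 * rng.standard_normal((d, n_actions))
    onehot = np.eye(n_actions)[planner_actions]
    for _ in range(epochs):
        logits = states @ W
        logits -= logits.max(axis=1, keepdims=True)   # stable softmax
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        W -= lr * states.T @ (p - onehot) / N         # cross-entropy grad
    return W

# Toy usage: the "planner" picks its action from the sign of one
# feature; the learned reactive policy should imitate it closely.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
a = (X[:, 0] > 0).astype(int)
W = train_reactive_policy(X, a, n_actions=2)
print((np.argmax(X @ W, axis=1) == a).mean())  # imitation accuracy
```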